{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/langroid/langroid/blob/main/examples/kg-chat/DependencyChatbot.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "M0zyjyKDE_0p"
   },
   "source": [
    "\n",
    "<img width=\"700\" src=\"https://raw.githubusercontent.com/langroid/langroid/main/docs/assets/langroid_neo4j_logos.png\" alt=\"Langroid\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4o6uFZwWko7C"
   },
   "source": [
    "# Overview\n",
    "\n",
    "🔥 for those curious about leveraging the power of LLM and knowledge graph in the software supply security domain.\n",
    "In this colab, we unveil the **Dependency Chatbot**, an LLM-powered application, equipped with a suite of specialized tools. It harnesses the power of Neo4j knowledge-graph and LLM for:\n",
    "\n",
    "* crafting queries in Neo4j's native language,\n",
    "* constructing detailed dependency graphs via DepsDev API,\n",
    "* searching the web for broader web-based insights.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zpFtWFn8K-Ui"
   },
   "source": [
    "# Motivation: Software Supply Chain Security\n",
    "\n",
    "This is a rapidly growing field, especially in light of the significant increase in software supply chain attacks. It focuses primarily on understanding and managing the dependencies in your software supply chain. With the rise of open-source and third-party components in software development, the need for supply chain security has become more critical than ever. Organizations are now realizing the importance of vetting and monitoring the components and dependencies they rely on to ensure the integrity and security of their software. As this field continues to evolve, it will be essential for developers and organizations to stay proactive in addressing supply chain vulnerabilities and implementing robust security measures.\n",
    "\n",
    "Managing dependencies starts with the ability to identify direct and transitive dependencies. Normally, this involves obtaining the full dependency graph, and writing custom code to answer questions about dependencies. In this colab, we introduce a far simpler approach with 2 key innovations:\n",
    "- store the dependency graph in a graph-db, specifically neo4j,\n",
    "- use an LLM-powered Agent that translates a user's questions into the query language of neo4j (known as Cypher)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rLgfXQq7DDMJ"
   },
   "source": [
    "# PyPi Package Dependency Chatbot\n",
    "\n",
    "This application combines the power of LLM and Knowledge Graphs (KG) to create a Retrieval-Augmented Generation (RAG) application for improved understanding of dependencies.\n",
    "\n",
    "This application focuses on PyPi packages and relies on [DepsDev](https://deps.dev/) to obtain the dependencies for a given package. More details about this Chatbot can be found [HERE](https://github.com/langroid/langroid/tree/main/examples/kg-chat).\n",
    "\n",
    "## Dependency Chatbot Architecture\n",
    "\n",
    "![Arch](https://github.com/langroid/langroid/blob/main/docs/assets/DepChatbot.png?raw=true)\n",
    "\n",
    "The chatbot comprises one agent `Neo4jChatAgent` that has access to three tools:\n",
    "\n",
    "1.   `GraphSchemaTool`: to get schema of Neo4j knowledge-graph.\n",
    "2.   `CypherRetrievalTool`: to generate cypher queries to get information from Neo4j knowledge-graph (Cypher is the query language for Neo4j).\n",
    "3.   `DepGraphTool`: to build the dependency graph for a given pkg version, using the API at [DepsDev](https://deps.dev/).\n",
    "4.   `GoogleSearchTool`: to find package version and type information. It also can answer other question from the web about other aspects after obtaining the intended information from the dependency graph.\n",
    "\n",
    "\n",
    "\n",
    "## Workflow\n",
    "The Dependency Chatbot's workflow is as follows:\n",
    "\n",
    "\n",
    "1.   The chatbot asks the user to provide the package name.\n",
    "2.   The chatbot tries to identify the version and verify this package is PyPi.\n",
    "3.   The user confirms the package details.\n",
    "4.   The chatbot will construct the dependency graph of the package including transitive dependencies.\n",
    "5.   At this stage, the user can ask the chatbot any question about the dependency graph, such as:\n",
    "  *   What are the packages at level 2?\n",
    "  *   Tell me 3 interesting things about the dependency graph?\n",
    "6.   For some questions that the chatbot can't answer from the the graph, it can use a web search tool to obtain additional information. For example, to identify the package version, the chatbot will use the web search tool.\n",
    "\n",
    "\n",
    "\n",
    "## Implementation\n",
    "We developed this application using the following tools/APIs:\n",
    "\n",
    "*   [Langroid](https://github.com/langroid/langroid): a framework for developling LLM applications.\n",
    "*   [Neo4j](https://neo4j.com/): a graph database management system.\n",
    "*   [Cypher Query Language](): graph query language that lets you retrieve data from the graph. It is like SQL for graphs.\n",
    "*   [DepsDev](https://deps.dev/): Open Source Insights is a service developed and hosted by Google to help developers better understand the structure, construction, and security of open source software packages.\n",
    "\n",
    "\n",
    "## Required environment settings:\n",
    "\n",
    "Before proceeding with the implementation, ensure that you have the necessary environment settings and keys in place.\n",
    "\n",
    "*   `OPENAI_API_KEY`\n",
    "*   GoogleSearchTool requires two keys:\n",
    "    *   `GOOGLE_API_KEY`: [setup a Google API key](https://developers.google.com/custom-search/v1/introduction#identify_your_application_to_google_with_api_key),\n",
    "    *   `GOOGLE_CSE_ID`: [setup a Google Custom Search Engine (CSE) and get the CSE ID](https://developers.google.com/custom-search/docs/tutorial/creatingcse)\n",
    "*    NEO4J ENV:\n",
    "    *   `username`: typically neo4j\n",
    "    *   `password`: your-neo4j-password\n",
    "    *   `uri`: uri-to-access-neo4j-dayabase\n",
    "    *   `database`: typically neo4j\n",
    "\n",
    "    These Neo4j settings will be requested later in this colab\n",
    "    \n",
    "    ```python\n",
    "    neo4j_settings = Neo4jSettings(\n",
    "      uri=\"\",\n",
    "      username=\"neo4j\",\n",
    "      password=\"\",\n",
    "      database=\"neo4j\",\n",
    "    )\n",
    "    ```\n",
    "\n",
    "**NOTE:** You can setup a free account at [Neo4j Aura](https://neo4j.com/cloud/platform/aura-graph-database/) to get access to Neo4j graph database.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aNbeze7LNiQa"
   },
   "source": [
    "## Install, setup, import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "k_wFJ06tA_8t"
   },
   "outputs": [],
   "source": [
    "# Silently install Langroid, suppress all output (~2-4 mins)\n",
    "!pip install -q --upgrade langroid &> /dev/null"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XorXx9GbPITC"
   },
   "outputs": [],
   "source": [
    "# Silently install Neo4j, suppress all output\n",
    "!pip install -q langroid[neo4j] &> /dev/null"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TmcOOLLeQC1t"
   },
   "source": [
    "## Environment settings\n",
    "\n",
    "This code will ask the user to provide the `OPENAI_API_KEY`, `GOOGLE_API_KEY`, and `GOOGLE_CSE_ID`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7T_R8_HWQShi"
   },
   "outputs": [],
   "source": [
    "# OpenAI API Key: Enter your key in the dialog box that will show up below\n",
    "# NOTE: colab often struggles with showing this input box,\n",
    "# if so, simply insert your API key in this cell, though it's not ideal.\n",
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "os.environ['OPENAI_API_KEY'] = getpass('Enter your OPENAI_API_KEY key:', stream=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "v0qMEBY9XYK2"
   },
   "outputs": [],
   "source": [
    "# Google keys for the web search tool\n",
    "os.environ['GOOGLE_API_KEY'] = getpass('Enter your GOOGLE_API_KEY key:', stream=None)\n",
    "os.environ['GOOGLE_CSE_ID'] = getpass('Enter your GOOGLE_CSE_ID key:', stream=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "z5spJbxjXPKv"
   },
   "outputs": [],
   "source": [
    "# various unfortunate things that need to be done to\n",
    "# control notebook behavior.\n",
    "\n",
    "# (a) output width\n",
    "\n",
    "from IPython.display import HTML, display\n",
    "\n",
    "\n",
    "def set_css():\n",
    "  display(HTML('''\n",
    "  <style>\n",
    "    pre {\n",
    "        white-space: pre-wrap;\n",
    "    }\n",
    "  </style>\n",
    "  '''))\n",
    "get_ipython().events.register('pre_run_cell', set_css)\n",
    "\n",
    "# (b) logging related\n",
    "import logging\n",
    "\n",
    "logging.basicConfig(level=logging.ERROR)\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings('ignore')\n",
    "import logging\n",
    "\n",
    "for logger_name in logging.root.manager.loggerDict:\n",
    "    logger = logging.getLogger(logger_name)\n",
    "    logger.setLevel(logging.ERROR)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "mJPl4mJ4Sg4r"
   },
   "outputs": [],
   "source": [
    "from langroid.agent.special.neo4j.neo4j_chat_agent import (\n",
    "  Neo4jChatAgent,\n",
    "  Neo4jChatAgentConfig,\n",
    "  Neo4jSettings,\n",
    ")\n",
    "from langroid.agent.task import Task\n",
    "from langroid.agent.tool_message import ToolMessage\n",
    "from langroid.agent.tools.google_search_tool import GoogleSearchTool\n",
    "from langroid.language_models.openai_gpt import OpenAIChatModel, OpenAIGPTConfig\n",
    "from langroid.utils.constants import NO_ANSWER"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Smezh1PUG3DD"
   },
   "source": [
    "## Define the tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 17
    },
    "id": "O_nbZciITsYq",
    "outputId": "0e4e00e1-0f92-40dc-adfa-d1b3d9234207"
   },
   "outputs": [],
   "source": [
    "# Define the tool `DepGraphTool` that will construct the dpendency graph\n",
    "# and answer user's questions\n",
    "class DepGraphTool(ToolMessage):\n",
    "    request = \"construct_dependency_graph\"\n",
    "    purpose = f\"\"\"Get package <package_version>, <package_type>, and <package_name>.\n",
    "    For the <package_version>, obtain the recent version, it should be a number.\n",
    "    For the <package_type>, return if the package is PyPI or not.\n",
    "      Otherwise, return {NO_ANSWER}.\n",
    "    For the <package_name>, return the package name provided by the user.\n",
    "    ALL strings are in lower case.\n",
    "    \"\"\"\n",
    "    package_version: str\n",
    "    package_type: str\n",
    "    package_name: str\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "SZMj3KFJTzHx"
   },
   "outputs": [],
   "source": [
    "# Defining the class of the `DependencyGraphAgent`\n",
    "class DependencyGraphAgent(Neo4jChatAgent):\n",
    "    def construct_dependency_graph(self, msg: DepGraphTool) -> None:\n",
    "        check_db_exist = (\n",
    "            \"MATCH (n) WHERE n.name = $name AND n.version = $version RETURN n LIMIT 1\"\n",
    "        )\n",
    "        response = self.read_query(\n",
    "            check_db_exist, {\"name\": msg.package_name, \"version\": msg.package_version}\n",
    "        )\n",
    "        if response.success and response.data:\n",
    "            # self.config.database_created = True\n",
    "            return \"Database Exists\"\n",
    "        else:\n",
    "            construct_dependency_graph = CONSTRUCT_DEPENDENCY_GRAPH.format(\n",
    "                package_type=msg.package_type.lower(),\n",
    "                package_name=msg.package_name,\n",
    "                package_version=msg.package_version,\n",
    "            )\n",
    "            if self.write_query(construct_dependency_graph):\n",
    "                self.config.database_created = True\n",
    "                return \"Database is created!\"\n",
    "            else:\n",
    "                return f\"\"\"\n",
    "                    Database is not created!\n",
    "                    Seems the package {msg.package_name} is not found,\n",
    "                    \"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3-JuJ_rBRWse"
   },
   "outputs": [],
   "source": [
    "# CONSTRUCT_DEPENDENCY_GRAPH is the Cypher query that will be used for constructing the dependency graph\n",
    "CONSTRUCT_DEPENDENCY_GRAPH = \"\"\"\n",
    "        with \"{package_type}\" as system, \"{package_name}\" as name, \"{package_version}\" as version\n",
    "\n",
    "        call apoc.load.json(\"https://api.deps.dev/v3alpha/systems/\"+system+\"/packages/\"\n",
    "                            +name+\"/versions/\"+version+\":dependencies\")\n",
    "        yield value as r\n",
    "\n",
    "        call {{ with r\n",
    "                unwind r.nodes as package\n",
    "                merge (p:Package:PyPi {{name: package.versionKey.name, version: package.versionKey.version}})\n",
    "                return collect(p) as packages\n",
    "        }}\n",
    "        call {{ with r, packages\n",
    "            unwind r.edges as edge\n",
    "            with packages[edge.fromNode] as from, packages[edge.toNode] as to, edge\n",
    "            merge (from)-[rel:DEPENDS_ON]->(to) ON CREATE SET rel.requirement\n",
    "            = edge.requirement\n",
    "            return count(*) as numRels\n",
    "        }}\n",
    "\n",
    "        match (root:Package:PyPi) where root.imported is null\n",
    "        set root.imported = true\n",
    "        with \"{package_type}\" as system, root.name as name, root.version as version\n",
    "        call apoc.load.json(\"https://api.deps.dev/v3alpha/systems/\"+system+\"/packages/\"\n",
    "                            +name+\"/versions/\"+version+\":dependencies\")\n",
    "        yield value as r\n",
    "\n",
    "        call {{ with r\n",
    "                unwind r.nodes as package\n",
    "                merge (p:Package:PyPi {{name: package.versionKey.name, version: package.versionKey.version}})\n",
    "                return collect(p) as packages\n",
    "        }}\n",
    "        call {{ with r, packages\n",
    "                unwind r.edges as edge\n",
    "                with packages[edge.fromNode] as from, packages[edge.toNode] as to, edge\n",
    "                merge (from)-[rel:DEPENDS_ON]->(to) ON CREATE SET\n",
    "                rel.requirement = edge.requirement\n",
    "                return count(*) as numRels\n",
    "        }}\n",
    "        return size(packages) as numPackages, numRels\n",
    "        \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ER3SGX_pLKkM"
   },
   "source": [
    "## Define the dependency agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "wAeZ_-SwTBzb"
   },
   "outputs": [],
   "source": [
    "# We also need to provide Neo4j environment variables before defining the `dependency_agent`\n",
    "neo4j_settings = Neo4jSettings(\n",
    "    uri=\"\",\n",
    "    username=\"neo4j\",\n",
    "    password=\"\",\n",
    "    database=\"neo4j\",\n",
    ")\n",
    "\n",
    "dependency_agent = DependencyGraphAgent(\n",
    "        config=Neo4jChatAgentConfig(\n",
    "            neo4j_settings=neo4j_settings,\n",
    "            use_tools=True,\n",
    "            use_functions_api=False,\n",
    "            llm=OpenAIGPTConfig(\n",
    "                chat_model=OpenAIChatModel.GPT4_TURBO,\n",
    "            ),\n",
    "        ),\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wRdR2EAaKSWH"
   },
   "source": [
    "## Define the task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gZ1QADohUH9N"
   },
   "outputs": [],
   "source": [
    "# Define the dependency task that will orchestrate the work for the `dependency_agent`\n",
    "system_message = f\"\"\"You are an expert in Dependency graphs and analyzing them using\n",
    "    Neo4j.\n",
    "\n",
    "    FIRST, I'll give you the name of the package that I want to analyze.\n",
    "\n",
    "    THEN, you can also use the `web_search` tool/function to find out information about a package,\n",
    "      such as version number and package type (PyPi or not).\n",
    "\n",
    "    If unable to get this info, you can ask me and I can tell you.\n",
    "\n",
    "    DON'T forget to include the package name in your questions.\n",
    "\n",
    "    After receiving this infomration, make sure the package version is a number and the\n",
    "    package type is PyPi.\n",
    "    THEN ask the user if they want to construct the dependency graph,\n",
    "    and if so, use the tool/function `construct_dependency_graph` to construct\n",
    "      the dependency graph. Otherwise, say `Couldn't retrieve package type or version`\n",
    "      and {NO_ANSWER}.\n",
    "    After constructing the dependency graph successfully, you will have access to Neo4j\n",
    "    graph database, which contains dependency graph.\n",
    "    You will try your best to answer my questions. Note that:\n",
    "    1. You can use the tool `get_schema` to get node label and relationships in the\n",
    "    dependency graph.\n",
    "    2. You can use the tool `retrieval_query` to get relevant information from the\n",
    "      graph database. I will execute this query and send you back the result.\n",
    "      Make sure your queries comply with the database schema.\n",
    "    3. Use the `web_search` tool/function to get information if needed.\n",
    "    \"\"\"\n",
    "\n",
    "task = Task(\n",
    "    dependency_agent,\n",
    "    name=\"DependencyAgent\",\n",
    "    system_message=system_message,\n",
    ")\n",
    "\n",
    "dependency_agent.enable_message(DepGraphTool)\n",
    "dependency_agent.enable_message(GoogleSearchTool)\n",
    "task.set_color_log(enable=False)\n",
    "task.run()"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
